So here are some specific examples of what we can't do today. Part-of-speech tagging is still not easy to do 100% correctly.
In the example "he turned off the highway" versus "he turned off the fan", the two "off"s actually have different syntactic categories: the first is a preposition introducing "the highway", while the second is a particle of the phrasal verb "turn off".
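To make this concrete, here is a minimal sketch (my own illustration, not from the lecture) using NLTK's off-the-shelf tagger; in practice the tagger may well assign the same tag to both "off"s, which is exactly the kind of error being pointed at here.

```python
import nltk

# One-time setup (assumed): nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

for sent in ["He turned off the highway.", "He turned off the fan."]:
    tokens = nltk.word_tokenize(sent)
    # Penn Treebank tags: ideally "off" is IN (preposition) in the first
    # sentence and RP (particle) in the second, but taggers often confuse them.
    print(nltk.pos_tag(tokens))
```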
It's also very difficult to get a complete parse of a sentence correct. Again, the example "a man saw a boy with a telescope" can be genuinely ambiguous: the prepositional phrase "with a telescope" may describe either the seeing or the boy, so the correct parse depends on the context.
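The ambiguity is easy to reproduce with a toy grammar. Below is a small sketch (again my own illustration, with a made-up grammar) in which NLTK's chart parser returns both attachments:

```python
import nltk

# A toy grammar (made up for illustration) that licenses both attachments
# of the PP "with a telescope": to the verb phrase or to the noun phrase.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    VP  -> V NP | VP PP
    NP  -> Det N | NP PP
    PP  -> P NP
    Det -> 'a'
    N   -> 'man' | 'boy' | 'telescope'
    V   -> 'saw'
    P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "a man saw a boy with a telescope".split()

# Two distinct trees come back, one per attachment.
for tree in parser.parse(sentence):
    print(tree)
```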
Precise, deep semantic analysis is also very hard. For example, defining the meaning of "own" precisely is very difficult in a sentence like "John owns a restaurant."
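As a rough illustration (my own sketch, not from the lecture), even writing the sentence down in first-order logic leaves the hard part untouched:

```latex
% "John owns a restaurant"
\exists x \,\bigl(\mathit{Restaurant}(x) \land \mathit{Own}(\mathit{John}, x)\bigr)
```

The formula pins down the structure, but the predicate Own itself remains undefined: does John hold legal title, does he merely run the place, does he own it alone or with partners? Answering such questions requires common-sense knowledge that is hard to encode.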
So the state of the art can be summarized as follows: robust and general NLP tends to be shallow, while deep understanding does not scale up.
For this reason, the techniques that we cover in this course are, in general, shallow techniques for analyzing and mining text data, and they are generally based on statistical analysis. So they are robust and general, and they fall into the category of shallow analysis.
Such techniques have the advantage of being applicable to any text data, in any natural language, about any topic.
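As a tiny example of how shallow a statistical technique can be while still being completely general (my own sketch), simple word counting works on any text whatsoever:

```python
from collections import Counter

# Counting words is shallow and statistical: it needs no grammar, no
# lexicon, and no knowledge of the topic, so it applies to any text.
text = "a man saw a boy with a telescope and the boy saw the man"
counts = Counter(text.split())
print(counts.most_common(3))  # -> [('a', 3), ('man', 2), ('saw', 2)]
```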
But the downside is that they don't give us a deeper understanding of the text.
For that, we have to rely on deeper natural language analysis. That typically requires human effort to annotate a lot of examples of the analysis we would like to do; computers can then use machine learning techniques to learn from these training examples how to perform the task.
So in practical applications, we generally combine the two kinds of techniques, with the general statistical methods as the backbone, as the basis. These can be applied to any text data. On top of that, we use humans to annotate more data and use supervised machine learning to do some tasks as well as we can, especially for those important tasks, bringing humans into the loop to analyze the text data more precisely.
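To show what this combination can look like in code, here is a minimal sketch (the task, the labels, and the examples are all made up for illustration): shallow bag-of-words counts serve as the statistical backbone, and a handful of human-annotated examples drive a supervised learner.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical human-annotated examples: does "off" relate to a device
# or to a road? (Both the task and the labels are assumptions.)
texts = ["he turned off the fan", "she switched off the light",
         "he turned off the highway", "they pulled off the road"]
labels = ["device", "device", "road", "road"]

# Statistical backbone (word counts) plus supervised learning on annotations.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["she switched off the fan"]))  # -> ['device']
```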
But this course will cover the general statistical approaches that generally don't require much human effort. So they are, practically speaking, more useful than some of the deeper analysis techniques that require a lot of human effort to annotate the text data.
So to summarize, the main points we take away are these. First, NLP is the foundation for text mining: obviously, the better we can understand the text data, the better we can do text mining. Computers today are far from being able to understand natural language.
Deep NLP requires common-sense knowledge and inference; thus, it only works for very limited domains and is not feasible for large-scale text mining.
Shallow NLP based on statistical methods can be done at large scale and is the main topic of this course; such techniques are generally applicable to a lot of applications, and in that sense they are also the more useful techniques.
In practice, we use statistical NLP as the basis and bring in human help as needed in various ways.

